Tumour mutations in long noncoding RNAs enhance cell fitness

Long noncoding RNAs (lncRNAs) are linked to cancer via pathogenic changes in their expression levels. Yet, it remains unclear whether lncRNAs can also impact tumour cell fitness via function-altering somatic “driver” mutations. To search for such driver-lncRNAs, we here perform a genome-wide analysis of fitness-altering single nucleotide variants (SNVs) across a cohort of 2583 primary and 3527 metastatic tumours. The resulting 54 mutated and positively-selected lncRNAs are significantly enriched for previously-reported cancer genes and a range of clinical and genomic features. A number of these lncRNAs promote tumour cell proliferation when overexpressed in in vitro models. Our results also highlight a dense SNV hotspot in the widely-studied NEAT1 oncogene. To directly evaluate the functional significance of NEAT1 SNVs, we use in cellulo mutagenesis to introduce tumour-like mutations in the gene and observe a significant and reproducible increase in cell fitness, both in vitro and in a mouse model. Mechanistic studies reveal that SNVs remodel the NEAT1 ribonucleoprotein and boost subnuclear paraspeckles. In summary, this work demonstrates the utility of driver analysis for mapping cancer-promoting lncRNAs, and provides experimental evidence that somatic mutations can act through lncRNAs to enhance pathological cancer cell fitness.

- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Rory Johnson and Roberta Esposito
Apr 26, 2023

Custom code is accessible at https://github.com/gold-lab/ExInAtor2.git and has additionally been deposited in Zenodo (v1.0.0; DOI: 10.5281/zenodo.7828265), where it is publicly available. Data and metadata were collected from International Cancer Genome Consortium (ICGC) consortium members using custom software packages designed by the ICGC Data Coordinating Centre. The general-purpose core libraries and utilities underlying this software have been released under the GPLv3 open source license as the "Overture" package and are available at https://www.overture.bio. Other data collection software used in this effort, such as ICGC-specific portal user interfaces, is available upon request to contact@overture.
The workflows executing core WGS alignment, QC and variant-calling software are packaged as executable Dockstore images and are available at: https://dockstore.org/search?labels.value.keyword=pcawg&searchMode=files. Additionally, for the data analysis we used R version 4.1.1; mass spectrometry data were interpreted with MaxQuant (version 1.6.14.0).

April 2023
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

Research involving human participants, their data, or biological material
Policy information about studies with human participants or human data. See also policy information about sex, gender (identity/presentation), and sexual orientation and race, ethnicity and racism.

Reporting on sex and gender
Reporting on race, ethnicity, or other socially relevant groupings

Population characteristics

Recruitment
Ethics oversight

Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
Data availability
- Somatic mutation data:
1) WGS somatic and germline variant calls, mutational signatures, subclonal reconstructions, transcript abundance, splice calls and other core data generated by the ICGC/TCGA Pan-cancer Analysis of Whole Genomes Consortium are available for download at https://dcc.icgc.org/releases/PCAWG. Additional information on accessing the data, including raw read files, can be found at https://docs.icgc.org/pcawg/data/. In accordance with the data access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier which does not require access approval. To access potentially identifying information, such as germline alleles and underlying sequencing data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset, and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for the ICGC portion. In addition, to access somatic single nucleotide variants derived from TCGA donors, researchers will also need to obtain dbGaP authorization.
2) HMF data can be requested at https://www.hartwigmedicalfoundation.nl/en/data/data-acces-request/
- Somatic mutation data and clinical data:
3) The Pan-cancer Analysis of Whole Genomes Consortium (PCAWG) publicly available data used in this study for survival analysis, mutual exclusivity and co-occurrence are available in the UCSC-Xenahub database [https://xenabrowser.net/datapages/?cohort=PCAWG%20(donor%20centric)].

We do not use sex and gender data. We collected data exclusively about mutational status.
We do not use or collect data about ethnicity.
Describe the covariate-relevant population characteristics of the human research participants (e.g. age, genotypic information, past and current diagnosis and treatment categories). If you filled out the behavioural & social sciences study design questions and have nothing to add here, write "See above."

No recruitment was performed in this study
Human liver tissue was obtained from patients undergoing surgical resection for colorectal metastasis. Signed informed consent was obtained from all patients in accordance with institutional guidelines and according to study approval of the Ethics Commission of the Canton of Bern.
We compiled an inventory of matched tumour/normal whole cancer genomes in the ICGC Data Coordinating Centre. Most samples came from treatment-naïve, primary cancers, but there were a small number of donors with multiple samples of primary, metastatic and/or recurrent tumours. Our inclusion criteria were: (i) matched tumour and normal specimen pair; (ii) a minimal set of clinical fields; and (iii) characterisation of tumour and normal whole genomes using Illumina HiSeq paired-end sequencing reads. We collected genome data from 2,834 donors, representing all ICGC and TCGA donors that met these criteria at the time of the final data freeze in autumn 2014. Regarding the wet lab experiments, a minimum of N=3 biologically independent experiments was performed.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

After quality assurance, data from 176 donors were excluded as unusable. Reasons for data exclusions included inadequate coverage, extreme bias in coverage across the genome, evidence for contamination in samples and excessive sequencing errors (for example, through 8-oxoguanine).
To evaluate the performance of each of the mutation-calling pipelines and determine an integration strategy, we performed a large-scale deep-sequencing validation experiment. We selected a pilot set of 63 representative tumour/normal pairs, on which we ran the three core pipelines, together with a set of 10 additional somatic variant-calling pipelines contributed by members of the SNV Calling Working Group. Overall, the sensitivity and precision of the consensus somatic variant calls were 95% (CI90%: 88-98%) and 95% (CI90%: 71-99%), respectively, for SNVs. For somatic indels, sensitivity and precision were 60% (34-72%) and 91% (73-96%), respectively. Regarding SVs, we estimate the sensitivity of the merging algorithm to be 90% for true calls generated by any one caller; precision was estimated as 97.5%, that is, 97.5% of SVs in the merged SV call-set have an associated copy number change or balanced partner rearrangement. All replication attempts were successful.
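As a reminder of how the two metrics quoted above are defined, the following is a minimal Python sketch that computes sensitivity and precision from validation counts and attaches an approximate 90% interval. The counts are hypothetical and chosen only for illustration, and the Wilson score interval is an assumption: the text does not state how the reported confidence intervals were derived.

```python
import math

def sensitivity(tp, fn):
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of calls that are true variants: TP / (TP + FP)."""
    return tp / (tp + fp)

def wilson_interval(successes, total, z=1.645):
    """Wilson score interval for a proportion.

    z = 1.645 approximates a two-sided 90% interval. Using this interval
    is an assumption made for this sketch, not the authors' stated method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical validation counts (not taken from the study).
tp, fp, fn = 95, 5, 5

sens = sensitivity(tp, fn)
prec = precision(tp, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
prec_lo, prec_hi = wilson_interval(tp, tp + fp)

print(f"sensitivity = {sens:.2f} (90% CI {sens_lo:.2f}-{sens_hi:.2f})")
print(f"precision   = {prec:.2f} (90% CI {prec_lo:.2f}-{prec_hi:.2f})")
```

The Wilson interval is used here only because it behaves sensibly for proportions near 1; a per-sample or bootstrap estimate would be equally plausible.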
Simulations are performed by random repositioning of mutations in exonic regions, while maintaining identical trinucleotide content. For the in vivo experiments, the mice were randomly allocated into the treatment groups (i.e. injection with cells previously mutated using sgRNAs targeting NEAT1 Region 2, NEAT1 Region 3 and a negative control region). The technician performing the in vivo experiments was blinded to group allocation during data collection and analysis.
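The simulation strategy mentioned above (random repositioning of mutations within exonic sequence while preserving trinucleotide content) can be illustrated with a minimal Python sketch. This is not the ExInAtor2 implementation: the function names, the toy sequence and the mutation positions are hypothetical, a single contiguous sequence stands in for the exonic regions, and strand and reverse-complement equivalence are ignored.

```python
import random
from collections import defaultdict

def trinucleotide_index(seq):
    """Map each trinucleotide to the central positions where it occurs."""
    index = defaultdict(list)
    for i in range(1, len(seq) - 1):
        index[seq[i - 1:i + 2]].append(i)
    return index

def reposition_mutations(seq, positions, rng):
    """Move each mutation to a random position sharing its trinucleotide context.

    Positions are 0-based and must not be the first or last base, because
    those bases have no complete trinucleotide context.
    """
    index = trinucleotide_index(seq)
    simulated = []
    for pos in positions:
        if pos < 1 or pos > len(seq) - 2:
            raise ValueError(f"position {pos} has no trinucleotide context")
        context = seq[pos - 1:pos + 2]
        # The original position is itself a candidate, so the list is never empty.
        simulated.append(rng.choice(index[context]))
    return simulated

# Toy exonic sequence and mutated positions (hypothetical).
exon = "ACGTACGTTACGAACGT"
observed = [1, 9]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
simulated = reposition_mutations(exon, observed, rng)

for o, s in zip(observed, simulated):
    print(f"{o} ({exon[o - 1:o + 2]}) -> {s} ({exon[s - 1:s + 2]})")
```

Repeating the repositioning many times would give a null distribution of mutation counts per region against which the observed counts can be compared.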
Antibodies used for immunoprecipitation:
- anti-SREK1 (Sigma, HPA037674), 2 µg
- anti-PQBP1 (Bethyl Laboratories, A302-802A), 2 µg
- Recombinant Rabbit IgG, monoclonal [EPR25A] - Isotype Control (Abcam, ab172730), 2 µg

All the commercial antibodies are validated by the manufacturers or by previous studies done by others or by our laboratory.

HeLa, HEK293T and HCT116 were a kind gift from Roderic Guigo's lab (CRG, Barcelona). The MRC5-SV cells were provided by the group of Ronald Dijkman (Institute of Virology and Immunology, University of Bern) and the HN5 tongue squamous cell carcinoma cells by Jeffrey E. Myers (MD Anderson) to Y. Zimmer. SNU-475 were purchased from ATCC (#crl-2236). HuH7 were purchased from Cell Line Service (#300156).

All the cell lines were authenticated using Short Tandem Repeat (STR) profiling (Microsynth Cell Line Typing).

All the cell lines tested negative for mycoplasma contamination.

No commonly misidentified cell lines were used in this study.